NetWorker: How to Debug Backup Operations
Summary: Several options are listed for debugging a failed NetWorker Backup.
Instructions
Log Files:
The principal logs for debugging backup failures are the policy log files, which are at the following location.
Linux: /nsr/logs/policy/policy_name/workflow_name/action_name
Windows: ..\Program Files\EMC NetWorker\nsr\logs\policy\policy_name\workflow_name\action_name
There are workflow log files in the raw format under /nsr/logs/policy/policy_name/workflow_name/jobid.raw and a subdirectory for each action. Each child action of an action has its own log file with the jobid of that child job. When the parent action starts a child action, NetWorker creates a directory for these child action logs.
The policy logs differ in size depending on the debug level that is used during the backup. The raw files are the workflow logs, while the backup_[jobid]_logs directories contain the action logs and child action logs.
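As a small illustration of the layout described above, the following shell sketch builds the Linux workflow log directory from a policy name and a workflow name and lists the raw files in it. Both function names are hypothetical helpers, not NetWorker commands, and the sketch assumes the default /nsr/logs/policy location.

```shell
# Hypothetical helper: print the Linux policy log directory for a workflow.
# Assumes the default /nsr/logs/policy location.
policy_log_dir() {
    # $1 = policy name, $2 = workflow name
    printf '/nsr/logs/policy/%s/%s\n' "$1" "$2"
}

# Hypothetical helper: list the workflow raw logs, newest first.
list_workflow_logs() {
    ls -lt "$(policy_log_dir "$1" "$2")"/*.raw
}
```

For example, list_workflow_logs Mona Bokonon_wf would list the raw files for the sample policy and workflow names used later in this article.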
The main NetWorker log file for all NetWorker operations is the daemon.raw log file.
This is located in [NetWorker_install_dir]\logs.
Windows: C:\Program Files\EMC NetWorker\nsr\logs
To read this log, you use the nsr_render_log command.
Example:
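A minimal sketch of rendering the daemon log, assuming nsr_render_log is in the PATH and that daemon.raw is in the default Linux location /nsr/logs. The render_log and search_log names are hypothetical wrappers introduced here for illustration only.

```shell
# Hypothetical wrapper: render a raw NetWorker log to readable text.
# Falls back to the assumed default Linux location of daemon.raw.
render_log() {
    # $1 = optional path to a raw log file
    nsr_render_log "${1:-/nsr/logs/daemon.raw}"
}

# Hypothetical helper: show only the rendered lines that contain a pattern.
search_log() {
    # $1 = text to search for, $2 = optional path to a raw log file
    render_log "$2" | grep -- "$1"
}
```

Typical use would be render_log > /tmp/daemon.log to keep a readable copy, or search_log severe to look only at lines containing that word.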
Further Resources:
503582 : NetWorker log files and how to collect for analysis
469489 : NetWorker List of Logs to Collect
457094 : Log files and information to collect and provide to support for general NetWorker issues
NetWorker Command Reference Guide
Save on the NetWorker Client
NetWorker client-based backups use the save process. The save process communicates with the NetWorker server, storage node (where applicable), or target backup device media. Debug can be enabled on the save process by passing the -D debug flag to it, using either the NetWorker Management Console (NMC) or the nsradmin command.
In the NMC, you change the 'Backup command' field in the relevant client properties to 'save -D9':
Example:
You can do the same operation using the nsradmin command:
Example:
Alternatively, on a Linux system, you can use the printf command to make this nsradmin change in one line:
Example:
printf "show
. type : NSR Client; name : vm-lego-231; save set : /alice
update backup command : save -D9
" | nsradmin -i -
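The one-line change above can also be parameterized. The sketch below generates the same nsradmin input for any client, save set, and debug level; save_debug_input and enable_save_debug are hypothetical helper names, and the generated text simply mirrors the printf example above.

```shell
# Hypothetical helper: print nsradmin input that sets the backup command
# of one client resource to "save -D<level>".
save_debug_input() {
    # $1 = client name, $2 = save set, $3 = debug level (default 9)
    printf 'show\n'
    printf '. type : NSR Client; name : %s; save set : %s\n' "$1" "$2"
    printf 'update backup command : save -D%s\n' "${3:-9}"
}

# Hypothetical helper: apply the change by piping the input to nsradmin.
enable_save_debug() {
    save_debug_input "$@" | nsradmin -i -
}
```

Running save_debug_input vm-lego-231 /alice on its own prints the input without changing anything, which is a convenient way to review it before calling enable_save_debug with the same arguments.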
Further Resources:
NetWorker Command Reference Guide
How to Use NetWorker nsradmin validation checking
Special Uses for the NetWorker nsradmin program Technical Note
Workflow Operation on the NetWorker Server
To debug the start of a workflow operation when detailed debug output is needed, run the workflow manually with the -D9 flag:
nsrworkflow -D9 -p [policy] -w [workflow]
This logs the workflow job debug output to the raw file in:
/nsr/logs/policy/policy_name/workflow_name/
Example:
Running the nsrworkflow command starts the job manually but uses the same schedule and level configuration as a scheduled, automated backup. Alternatively, use the -a flag to run nsrworkflow as an ad hoc backup, which allows you to override the backup schedule or level. To specify the backup level that you want (rather than the level set for today's run of the workflow), use the -l option (or -L for virtual machine backups).
Example:
nsrworkflow -p [policy] -w [workflow] -A "'[action]' -l [level]" -a
nsrworkflow -p Mona -w Bokonon_wf -A "'backup' -l full" -a
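The two nsrworkflow invocations shown in this section can be wrapped in small shell functions so that the quoting of the -A argument does not have to be retyped. This is only a sketch: run_workflow_debug and run_adhoc_backup are hypothetical names, and the flags are exactly the ones used in the commands above.

```shell
# Hypothetical wrapper: start a workflow manually with debug output.
run_workflow_debug() {
    # $1 = policy, $2 = workflow, $3 = debug level (default 9)
    nsrworkflow -D"${3:-9}" -p "$1" -w "$2"
}

# Hypothetical wrapper: start a workflow ad hoc with an explicit backup level.
run_adhoc_backup() {
    # $1 = policy, $2 = workflow, $3 = action name, $4 = backup level
    nsrworkflow -p "$1" -w "$2" -A "'$3' -l $4" -a
}
```

With these definitions, run_adhoc_backup Mona Bokonon_wf backup full issues the same command as the second example line above.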
Further Resources:
516616 : How to use the NetWorker nsrworkflow command
513030 : How to use the NetWorker nsrpolicy command
NetWorker 9.1.x Release Notes
NetWorker Command Reference Guide
Savefs on the NetWorker Client
The savefs command is used during client-based backups. It is sent to the NetWorker client after the backup is initiated on the NetWorker server. savefs is the process responsible for determining which files and directories to back up for this specific backup run on this client.
You can obtain the exact savefs command which is being run on the client side from the raw file in the policy logs (/nsr/logs/policy/[policy name]/[workflow name]). Then run this on the client side, adding the -D9 option:
Example:
On the NetWorker server:
And then on the client side:
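The server-side step can be sketched as follows: render the workflow raw file and keep only the lines that mention savefs. The find_savefs_command name is a hypothetical helper, and the exact layout of the rendered lines is not assumed here; the sketch only filters on the word savefs.

```shell
# Hypothetical helper: print the lines of a workflow raw log that mention savefs.
# $1 = path to the workflow raw file, for example
#      /nsr/logs/policy/[policy name]/[workflow name]/[jobid].raw
find_savefs_command() {
    nsr_render_log "$1" | grep savefs
}
```

The savefs command line reported by this search is then copied to the client and run there with -D9 added, as described above.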
Further Resources:
NetWorker Command Reference Guide
Assigning Target Media on the NetWorker Server
The assignment of the correct target volume for a backup is managed by the nsrd process on the NetWorker server. To debug this, you must temporarily increase the debug level of the nsrd process on the NetWorker server using the dbgcommand.
Example:
After debugging is completed, you must turn off the debugging like so:
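A minimal sketch of both steps, assuming the dbgcommand -n [process name] Debug=[level] form of the command and that dbgcommand is in the PATH on the NetWorker server. The set_daemon_debug name is a hypothetical wrapper.

```shell
# Hypothetical wrapper: set the debug level of a running NetWorker daemon by name.
# Assumes the "dbgcommand -n <process> Debug=<level>" form.
set_daemon_debug() {
    # $1 = process name (for example nsrd), $2 = debug level (0 turns debug off)
    dbgcommand -n "$1" Debug="$2"
}
```

Under these assumptions, set_daemon_debug nsrd 9 raises the debug level before the backup is reproduced, and set_daemon_debug nsrd 0 turns debugging off again afterwards.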
Backups Waiting for Writeable Volume
If the NetWorker server cannot find a suitable NetWorker volume to write to, the backup waits and the server generates an alert. In this case, the job remains in the 'active' state. You can check the state of the job using the nsrpolicy monitor command.
Example:
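A short sketch of checking the job state from the command line. It assumes that nsrpolicy monitor accepts -p for the policy and -w for the workflow; check the NetWorker Command Reference Guide for the options available in your version. The show_workflow_jobs name is a hypothetical wrapper.

```shell
# Hypothetical wrapper: show the jobs of one workflow and their states.
# Assumes the "nsrpolicy monitor -p <policy> -w <workflow>" form.
show_workflow_jobs() {
    # $1 = policy name, $2 = workflow name
    nsrpolicy monitor -p "$1" -w "$2"
}
```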
The alert in the NetWorker Management Console gives more details on what type of volume is being sought and on which Storage Node.
Example:
Backups unexpectedly stopped responding due to parallelism
The NetWorker server may determine that it cannot continue with the backup because there is no free parallelism slot. In this case, the job is in the 'queued' state.
To debug parallelism, you must increase the debug level of the nsrjobd process on the NetWorker server. The daemon log file then outputs a large amount of debugging data related to parallelism.
Example:
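Because this debug level produces a lot of output, one option is to raise it only for a limited time window. The sketch below does that for any daemon name, assuming the dbgcommand -n [process name] Debug=[level] form; debug_window is a hypothetical helper name.

```shell
# Hypothetical helper: raise a daemon's debug level for a fixed time window,
# then reset it to 0 so the daemon log does not keep growing.
# Assumes the "dbgcommand -n <process> Debug=<level>" form.
debug_window() {
    # $1 = process name, $2 = debug level, $3 = seconds to keep debug on
    dbgcommand -n "$1" Debug="$2" || return 1
    sleep "$3"
    dbgcommand -n "$1" Debug=0
}
```

For example, debug_window nsrjobd 9 300 would keep nsrjobd at debug level 9 for five minutes while the queued backup is reproduced.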
Further Resources:
NetWorker Performance Optimization Planning Guide
Parallelism and Target Sessions
Client Direct backup not working as expected
A "Client direct" backup sends data directly from the NetWorker client to the target media without first writing to the NetWorker Storage Node.
You can define in the client properties whether client direct backup should be used or not for this client instance.
In order to troubleshoot whether client direct is working or not, you must inspect the logs as per the below example:
Example:
Log output: Client direct in operation.
Daemon log file on the NetWorker server:
91787 08/01/2014 01:37:35 PM nsrmmd NSR notice Save-set ID '4091251191' (vm-lego-231:/NetWorker) is using direct file save with Data Domain device 'dd4500-dd.local_onetwoone'.
lsof on the NetWorker client:
[root@vm-lego-231 ~]# lsof -i TCP | grep save
save 9831 root 3u IPv4 111668 0t0 TCP vm-lego-231:23178->vm-lego-121:8985 (ESTABLISHED)
save 9831 root 5u IPv4 111695 0t0 TCP vm-lego-231:19752->vm-lego-121:9417 (ESTABLISHED)
save 9831 root 7u IPv4 111720 0t0 TCP vm-lego-231:31095->vm-lego-121:9035 (ESTABLISHED)
save 9831 root 8u IPv4 111728 0t0 TCP vm-lego-231:12421->vm-lego-121:9653 (ESTABLISHED)
save 9831 root 9u IPv4 111731 0t0 TCP vm-lego-231:33739->dd4500-dd.local:nfs (ESTABLISHED)
save 9831 root 10u IPv4 111736 0t0 TCP vm-lego-231:60278->dd4500-dd.local:midnight-tech (ESTABLISHED)
Note: We can see that there are open TCP connections from the client both to the NetWorker server and to the DD. If you need to know exactly which processes on the NetWorker server the client is connected to, you can cross-check with lsof on the server. The fourth column is the file descriptor being used.
On a Windows system, you can see similar output by using resmon: Start - Run - resmon - Network tab - TCP Connections
Log output: Backup is not using client direct.
Daemon log file on the NetWorker server:
91797 08/01/2014 01:57:51 PM nsrmmd NSR severe Unable to perform direct file save with Data Domain device 'ONETWOONE'; setting up traditional save for save-set ID '4024143566' (vm-lego-231:/NetWorker)
Note: Looking for the word traditional in the log gives you this output quickly. If you need to find out why it is not using client direct, start with the NetWorker Administration Guide's list of conditions that need to be met for client direct to work. The most common reasons would be that the client has no direct network access to the DD from the NIC it is using or that the name resolution is not working correctly from the client.
lsof on the NetWorker client:
[root@vm-lego-231 ~]# lsof -i TCP | grep save
save 10114 root 3u IPv4 123335 0t0 TCP vm-lego-231:46461->vm-lego-121:8985 (ESTABLISHED)
save 10114 root 5u IPv4 123369 0t0 TCP vm-lego-231:12593->vm-lego-121:9417 (ESTABLISHED)
save 10114 root 7u IPv4 123392 0t0 TCP vm-lego-231:63952->vm-lego-121:9035 (ESTABLISHED)
save 10114 root 8u IPv4 123400 0t0 TCP vm-lego-231:29597->vm-lego-121:9653 (ESTABLISHED)
Note: Only TCP connections to the NetWorker Server (which is also the Storage Node in this example) are open here. There is no TCP connection open to the DD. All the data is going to the Storage Node.
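The lsof comparison above can be automated with a small filter. The sketch below reads lsof -i TCP output on standard input and reports whether a save process has an established connection to a given host. The save_connects_to name is a hypothetical helper, and the parsing assumes host names and IPv4 addresses as in the output shown above (an IPv6 address would need different handling of the colons).

```shell
# Hypothetical helper: read "lsof -i TCP" output on stdin and succeed if a
# save process has an established connection to the given host.
save_connects_to() {
    # $1 = host name as lsof prints it (for example the Data Domain name)
    awk -v host="$1" '
        $1 == "save" && /ESTABLISHED/ {
            split($(NF-1), ends, "->")    # field looks like local:port->remote:port
            split(ends[2], remote, ":")   # keep the remote host, drop the port
            if (remote[1] == host) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

On the client, this could be used as lsof -i TCP | save_connects_to dd4500-dd.local && echo "save has a direct connection to the DD". No output from that command line would correspond to the second case above, where only connections to the NetWorker server are open.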
Further Resources:
NetWorker Performance Optimization Planning Guide
Parallel Save Stream Backups
To debug PSS backups, ensure that the 'parallel save stream' property is ticked in the client resource in the NetWorker Management Console. Put the save command in debug mode as described in "Save on the NetWorker Client" above. Also create an empty file called 'mbsdopen' in ../nsr/debug. This provides extra debug logging both on the client in /nsr/tmp and in the policy logs on the NetWorker server (see "Log Files" above).
Example:
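Creating the empty marker file can be sketched as below. The sketch assumes a Linux client where the debug directory is /nsr/debug; another directory can be passed as the first argument. Both function names are hypothetical helpers, and removing the file after the debug run is a suggestion made here, not a step stated in this article.

```shell
# Hypothetical helper: create the empty mbsdopen marker file.
# Assumes /nsr/debug on a Linux client unless a directory is passed as $1.
enable_pss_debug() {
    dir="${1:-/nsr/debug}"
    mkdir -p "$dir" && : > "$dir/mbsdopen"
}

# Hypothetical helper: remove the marker file once the debug run is finished.
disable_pss_debug() {
    rm -f "${1:-/nsr/debug}/mbsdopen"
}
```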
Further Resources:
How to Troubleshoot NetWorker Parallel Save Stream backups
NetWorker Performance Optimization Planning Guide
NetWorker Storage Node nsrmmd process not working as expected as it writes to the target media.
You can increase the debug level of the nsrmmd processes using dbgcommand (described under "Assigning Target Media on the NetWorker Server" above). You can either increase the debug level of all the nsrmmd processes or use operating system tools to identify which nsrmmd process is active and increase the level for that process only.
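A minimal sketch of the "all nsrmmd processes" variant on a Linux storage node. It assumes that pgrep is available and that dbgcommand accepts -p [pid] Debug=[level]; set_nsrmmd_debug is a hypothetical helper name.

```shell
# Hypothetical helper: set the debug level on every running nsrmmd process.
# Assumes pgrep is available and the "dbgcommand -p <pid> Debug=<level>" form.
set_nsrmmd_debug() {
    # $1 = debug level (0 turns debug off)
    for pid in $(pgrep -x nsrmmd); do
        dbgcommand -p "$pid" Debug="$1"
    done
}
```

To target a single process instead, identify its PID with a tool such as ps and pass that PID to dbgcommand -p directly. Calling set_nsrmmd_debug 0 afterwards resets all nsrmmd processes.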
Further Resources:
479665 : Triage Article: Troubleshooting Tape Library Problems in NetWorker
NetWorker Data Domain Boost Integration Guide
Additional Information
Other Debugging Tips for Specific NetWorker Technologies:
- Tuning NetWorker Server for Optimum Performance
- NVP-vProxy: How to enable debug logging
- How to test NetWorker client-server communication through a firewall
- How to troubleshoot NetWorker Scheduled Cloning failures
- NetWorker Troubleshooting Guide: Process Crashes and Core Dumps
- NetWorker NMC 9.x: How to enable the debug logs
- How to enable debug for NMDA
- NMM Detailed Troubleshooting Guide
- How to debug recover job failures from NMC
- NDMP Triage Guide
- 479591 : Reclaiming space from Data Domain devices triage guide











